
206 research outputs found

Advances in pre-processing and model generation for mass spectrometric data analysis

Author: Schleif Frank-Michael
Publication venue: Dagstuhl Seminar Proceedings. 07131 - Similarity-based Clustering and its Application to Medicine and Biology
Publication date: 01/01/2007

The analysis of complex signals obtained from mass spectrometric measurements is difficult and requires an appropriate representation of the data. The choice of preprocessing, feature extraction, and similarity measure is therefore of particular importance. Focusing on biomarker analysis and taking the functional nature of the data into account makes the task even more complicated. A new preprocessing approach tailored to mass spectrometry data is presented, discussed, and analyzed in a clinical proteomics study and compared to a standard setting.

Dagstuhl Research Online Publication Server

Generic probabilistic prototype based classification of vectorial and proximity data

Author: Schleif Frank-Michael
Publication venue: Elsevier BV
Publication date: 01/04/2015

Crossref

University of Birmingham Research Portal

Multi-perspective embedding for non-metric time series classification

Authors: Heilig Simon, Münch Maximilian, Schleif Frank-Michael
Publication venue: Universite Catholique de Louvain
Publication date: 01/01/2021

Interest in time series analysis is increasing rapidly, posing new challenges for machine learning. For decades, Dynamic Time Warping (DTW) has been regarded as the de facto standard distance measure for time series and the tool of choice for analyzing such data. Nevertheless, DTW has two major drawbacks: (a) it is non-metric and therefore hard to handle with standard machine learning techniques, and (b) it is not well suited to multi-dimensional time series. To address these drawbacks, we propose a multi-perspective embedding of the time series into a complex-valued vector space, evaluated by a model that can handle complex-valued data. The approach is evaluated on various multi-dimensional time series datasets and with different classification techniques.
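The non-metric property noted in (a) can be checked on a toy example. The sketch below is a minimal textbook DTW with the absolute difference as local cost; the function name `dtw` and the three sequences are illustrative assumptions, not taken from the paper, and the code does not implement the proposed embedding.

```python
def dtw(a, b):
    """Textbook DTW distance between two 1-D sequences (no window constraint)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again
                cost[i][j - 1],      # b[j-1] is matched again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical toy sequences chosen to expose a triangle-inequality violation.
x = [0]
y = [0, 1]
z = [1, 1, 1]

d_xy = dtw(x, y)  # 1.0
d_yz = dtw(y, z)  # 1.0
d_xz = dtw(x, z)  # 3.0
print(d_xy, d_yz, d_xz)
print("triangle inequality holds:", d_xz <= d_xy + d_yz)  # False
```

Here the direct distance from `x` to `z` exceeds the detour through `y`, so DTW cannot be a metric on these sequences.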

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Data-Driven Supervised Learning for Life Science Data

Authors: Biehl Michael, Münch Maximilian, Raab Christoph, Schleif Frank-Michael
Publication venue: Frontiers Media SA
Publication date: 06/11/2020

Life science data are often encoded in a non-standard way by means of alphanumeric sequences, graph representations, numerical vectors of variable length, or other formats. Domain-specific or data-driven similarity measures such as alignment functions have been employed with great success. However, the vast majority of more complex data analysis algorithms require fixed-length vectorial input, which demands substantial preprocessing of life science data, and data-driven measures are widely ignored in favor of simple encodings. These preprocessing steps are neither always easy to perform nor particularly effective, and they risk a loss of information and interpretability. We present some strategies and concepts for employing data-driven similarity measures in the life science context and in other complex biological systems. In particular, we show how to use data-driven similarity measures effectively in standard learning algorithms.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Complex-valued embeddings of generic proximity data

Authors: Biehl Michael, Münch Maximilian, Schleif Frank-Michael, Straat Michiel
Publication date: 31/08/2020

Proximities are at the heart of almost all machine learning methods. If the input data are given as numerical vectors of equal length, the Euclidean distance or a Hilbertian inner product is frequently used in modeling algorithms. In a more generic view, objects are compared by a (symmetric) similarity or dissimilarity measure, which may not obey particular mathematical properties. This renders many machine learning methods invalid, leading to convergence problems and the loss of guarantees such as generalization bounds. In many cases, the preferred dissimilarity measure is not a metric, as with the earth mover's distance, or the similarity measure is not a simple inner product in a Hilbert space but in its generalization, a Krein space. If the input data are non-vectorial, like text sequences, proximity-based learning is used or n-gram embedding techniques can be applied. Standard embeddings lead to the desired fixed-length vector encoding, but they are costly and have substantial limitations in preserving the full information of the original data. As an information-preserving alternative, we propose a complex-valued vector embedding of proximity data. The resulting fixed-length, complex-valued vectors can serve as input to complex-valued machine learning algorithms for further processing. In particular, we address supervised learning and use extensions of prototype-based learning. The proposed approach is evaluated on a variety of standard benchmarks and shows strong performance compared to traditional techniques in processing non-metric or non-psd proximity data.

Comment: Proximity learning, embedding, complex values, complex-valued embedding, learning vector quantization
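The abstract does not spell out how the complex-valued embedding is built. One plausible construction, assumed here purely for illustration, is to eigendecompose a symmetric similarity matrix and take complex square roots of the eigenvalues, so that negative eigenvalues yield imaginary coordinates. The function name `complex_embedding` and the toy matrix `S` are hypothetical.

```python
import numpy as np


def complex_embedding(S):
    """Embed a symmetric, possibly indefinite similarity matrix into C^n.

    Assumed construction (not confirmed by the abstract):
    S = U diag(lam) U^T, X = U diag(sqrt(lam)), with complex square roots
    so that negative eigenvalues contribute imaginary coordinates.
    """
    S = np.asarray(S, dtype=float)
    S = 0.5 * (S + S.T)                   # enforce exact symmetry
    lam, U = np.linalg.eigh(S)            # real eigenvalues, orthonormal U
    roots = np.sqrt(lam.astype(complex))  # sqrt of a negative value is imaginary
    return U * roots                      # scales column k by sqrt(lam[k])


# Hypothetical non-psd similarity matrix for three objects.
S = np.array([[1.0, 0.9, -0.8],
              [0.9, 1.0, 0.7],
              [-0.8, 0.7, 1.0]])

X = complex_embedding(S)  # one fixed-length complex vector per row
S_rec = X @ X.T           # plain transpose, no complex conjugation

print("smallest eigenvalue:", np.linalg.eigvalsh(S).min())
print("similarities recovered:", np.allclose(S_rec, S))
```

Under this assumption the original similarities are recovered exactly by the bilinear (non-conjugated) product of the embedded vectors, which is why no information is lost even though `S` is not positive semi-definite.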

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen